Improving semantic topic clustering for search queries with word co-occurrence and bigraph co-clustering

نویسندگان

  • Jing Kong
  • Alex Scott
  • Georg M. Goerg
چکیده

Uncovering common themes from a large number of unorganized search queries is a primary step to mine insights about aggregated user interests. Common topic modeling techniques for document modeling often face sparsity problems with search query data as these are much shorter than documents. We present two novel techniques that can discover semantically meaningful topics in search queries: i) word co-occurrence clustering generates topics from words frequently occurring together; ii) weighted bigraph clustering uses URLs from Google search results to induce query similarity and generate topics. We exemplify our proposed methods on a set of Lipton brand as well as make-up & cosmetics queries. A comparison to standard LDA clustering demonstrates the usefulness and improved performance of the two proposed methods. keywords: search queries, topic clustering, word cooccurrence, bipartite graph, co-clustering

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Discipline Hotspots Mining Based on Hierarchical Dirichlet Topic Clustering and Co-word Network

Discovering inherent correlations and hot research topics among various disciplines from massive scientific documents is very important to understand the scientific research tendency. The LDA (Latent Dirichlet Allocation) topic model can find topics from big data sets, but the number of topics must to be told before topic clustering. There is a lot of randomness to determine the number of topic...

متن کامل

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...

متن کامل

Clustering Web Search Results with Maximum Spanning Trees

We present a novel method for clustering Web search results based on Word Sense Induction. First, we acquire the meanings of a query by means of a graph-based clustering algorithm that calculates the maximum spanning tree of the co-occurrence graph of the query. Then we cluster the search results based on their semantic similarity to the induced word senses. We show that our approach improves c...

متن کامل

A Comparative Study of Word Co-occurrence for Term Clustering in Language Model-based Sentence Retrieval

Sentence retrieval is a very important part of question answering systems. Term clustering, in turn, is an effective approach for improving sentence retrieval performance: the more similar the terms in each cluster, the better the performance of the retrieval system. A key step in obtaining appropriate word clusters is accurate estimation of pairwise word similarities, based on their tendency t...

متن کامل

Word clustering effect on vocabulary learning of EFL learners: A case of semantic versus phonological clustering

The aim of this study is to determine the effect of word clustering method on vocabulary learning of Iranian EFL learners through a case of semantic versus phonological clustering. To this effect, 80 homogeneous students from four intermediate classes at an English institute in Torbat e Heydariyeh participated in this research. They were assigned to four groups according to semantic versus phon...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016